Statistical Mechanics Approach to Sparse Noise 

Denoising 

Mikko Vehkapera 1,2 , Yoshiyuki Kabashima 3 , Saikat Chatterjee 1 
X KTH Royal Institute of Technology and the ACCESS Linnaeus Center, SE-100 44, Stockholm, Sweden 
2 Aalto University, P.O. Box 1 1000, FI-00076 AALTO, Finland 
3 Tokyo Institute of Technology, Yokohama, 226-8502, Japan 



E-mails: mikkov@kth.se, kaba@dis.titech.ac.jp sach@kth.se 



Abstract — Reconstruction fidelity of sparse signals contami- 
nated by sparse noise is considered. Statistical mechanics inspired 
tools are used to show that the i'l-norm based convex optimization 
algorithm exhibits a phase transition between the possibility of 
perfect and imperfect reconstruction. Conditions characterizing 
this threshold are derived and the mean square error of the 
estimate is obtained for the case when perfect reconstruction is 
not possible. Detailed calculations are provided to expose the 
mathematical tools to a wide audience. 

Index Terms — sparse signals and noise, replica method, statis- 
tical mechanical analysis 

I. Introduction 

Sparse signal estimation for linear underdetermined systems 
has attracted wide interest in signal processing community 
during the recent years. This is not surprising since the 
general class of sparse problems is encountered in many 
applications, such as, linear regression (TJ, multimedia |2|, 
and compressive sampling (CS) Q, |5j, to name just a 

few. 

The present paper considers a CS setup where the sparse 
vector x E R N is observed via noisy linear measurements 

y = Ax + w, (1) 

where A £ R MxN represents the compressive (M < N) 
sampling system and y G M M is the observed vector. The 
measurement errors are captured by the additive noise vector 
w 6 M. M . The task is then to reconstruct x from y, given A. 
Typically, detailed information about the statistics of x and 
w is not available to the system designer apart from some 
general knowledge, such as, "signal and noise are sparse". 

A prominent approach for finding a sparse solution to ([T| 
is by solving a (convex) optimization problem of the form 

x x = argmin {C y , A {x) + A||cc||i} , (2) 

where ||cc||i = J2 n \ Xn \ denotes the standard £i-norm. The 
non-negative cost function C v ,a(x) that may depend on the 
realizations of y and A is typically chosen so that the solution 
to |2) can be obtained using standard convex optimization tools 
like cvx [6|. In addition to the choice of C y a{x), the solution 
also depends on the regularization parameter A. In general, 
finding the optimal value of A is not a trivial task. 



For the case of (dense) Gaussian noise, the standard ap- 
proach is to set Cy : A{x) = Ax|||, reducing |2]) to the so- 
called LASSO estimator (7). It is well-known that for non-zero 
noise variance, the solution obtained through LASSO is not 
exact. However, if the noise has some structure, like sparsity, 
perfect reconstruction could again be feasible. Indeed, in many 
applications the noise w can be considered to be sparse, some 
examples of which are: impulsive noise (8), salt-and-pepper 
noise in an image, a sensor scenario where few measurements 
are corrupted but the other ones are good [9], and dictionary 
learning with sparse noise flO) . 

Inspired by the above practical examples, let us consider a 
setup where both x and w are sparse. To take into account 
the sparseness of noise, we set 

C y , A (x) = \\y- Ax\\ u (3) 

and note that this choice makes |2]l a convex optimization 
problem. We then ask the following questions: 

1) Given that the signal and noise are sparse with some 
prior distributions (that we do not know), what values 
of M and N allow perfect reconstruction of xl 

2) What is the mean square error (MSE) of the sparse 
estimate of x outside of this region? 

We answer these questions in the large system limit (LSL) by 
using mathematical tools developed in statistical mechanics. 
The key technique is the replica tricl^ that can be used to 
assess the free energy of a disordered system ||TTJ . This is 
particularly useful in equilibrium statistical mechanics since 
the large-scale behavior of the system can be inferred from it. 
Similar notions have been made in information theory fl2|- 
(15J and in CS fT6)-fT9), where quantities like mutual infor- 
mation and MSE play the role of thermodynamic variables. 

II. Problem Formulation and Methods 

Consider the set of noisy measurements ([1} and assume that 
both the signal and the noise are sparse random vectors (RVs). 
We define a parametrized mixture distribution 

p(z; p,a 2 ) = {l-p)S(z)+pg z (0,a 3 ), (4) 

1 The most serious drawback of the replica method is that some of its steps 
are still lacking formal proof. Hence, it can be considered to be at most a 
"semi-rigorous" analytical tool. Regardless, the replica trick is routinely used 
in equilibrium statistical mechanics and its predictions are often verified by 
experiments. We shall also demonstrate this later in the paper. 



where g z (ji, a 2 ) = e~( z ~^ 2 / 2(y2 /^/2-kcj 2 , and let the elements 
of x (resp. w) be independently and identically distributed 
(IID) according to p(x; p x ,a\) (resp. p(w; p w ,o%,)). Note 
that p e [0, 1] in Q gives the fraction of non-zero Gaussian 
elements in the vector whereas a 2 is the variance of them. 
The measurement process is taken to be random so that the 
elements of A are IID with density]^] (^(0, 1/N). For future 
convenience, we denote the ratio between the number of 
observables and the unknown parameters by a = M/N. To 
enable the use of statistical mechanics tools, we next write the 
original problem in a probabilistic framework. 

Let us consider the optimization problem |2} with the Ni- 
cest ([3]). Assume the system is in the LSL M, N — > oo, 
where the compression ratio a = M/N and the density of the 
signal and noise p x , p w remain constant. Let the postulated 
prior of x be proportional to the Laplace distribution, namely, 
Q/3,x( x ) oc e — ^IHI 1 , where /3 > (inverse temperature 
in statistical mechanics). The postulated distribution of the 
measurement process has the same form, that is, qp{y \ 
A, x) oc e-P\\v- Ax \h ^ and the (mismatched) conditional mean 
estimator of x reads by definition 

^ ;A ' > < 3 = Zp(y A- A) f xq P( y I A , x )ll3^{x)dx, (5) 

where Z p (y,A;X) = j e-^-^^+^^dx. Then, the 
zero temperature estimate X\ = (x; X)b-hx>> is the solution to 
the original optimization problem defined by (|2]i and (|3). 

A. Replica Method 

The key for finding the statistical properties of the recon- 
struction |5]l is the relatively innocent looking normalization 
factor Zp(y, A; A), that acts as the partition function of 
the system. Based on the standard approach in statistical 
mechanics, our goal should then be to assess the normalized 
free energy fp(y, A; A) = lnZp(y, A; A), when N -> oo 
and obtain the desired statistical properties from it. This 
formulation is, however, problematic since / depends on the 
observations y and the measurement process A. A way around 
is to consider the averaged quantity //3(A) = Efp(y, A; X) 
instead. Unfortunately, the problem of computing an expecta- 
tion over a logarithm is in general very difficult. Hence, we 
reformulate fp(X) in the zero temperature limit as 



(6) 



and remark that so-far our treatment has been rigorous if the 
normalized free energy exists in the LSL (we do not dwell on 
this technical point due to space constraints). Unfortunately, 
obtaining an expression for Q is still difficult. This is when 
the replica trick comes in handy. 

Replica trick. Consider the normalized free energy in |6) and 
assume that we can change the order of the limits. Postulate 

2 It is a common practice in CS to normalize the columns of A instead of 
the rows. The former can be obtained here by redefining the signal power to 
be aa^. . This, however, has no effect on the perfect recovery conditions. 



further that inside the logarithm we may write 

[Zp(y,A;X)} u = / J] e -my-^ a h+^ a h) dx ^ (7) 
J 0=1 

and take the limit u — > + outside of it, after we have 
calculated the expectations for integer u. 

The most debatable step in the replica trick is the assump- 
tion that the variable u (number of replicas) can be first 
treated as a non-negative integer and then extended to the 
set of real numbers. There is no rigorous mathematical proof 
that guarantees this can be done. Numerous results both in 
physics and engineering show, however, that the predictions 
of the replica method tend to be accurate when compared to 
numerical experiments. For the rest of the paper we assume 
thus that the replica trick is valid and verify its predictions in 
the end using Monte Carlo simulations. 

The general scheme for the replica analysis consists of first 
assessing Q using the above replica trick and then identify 
the parameters that describe the MSE of the reconstruction. 
Finally, requiring that the MSE vanishes provides the threshold 
for perfect recovery. The next section reports the outcomes 
of the analysis and the last part of the paper is devoted for 
describing how the calculations were carried out. 

III. Results 

Let Q denote the standard Q-function and define two 
functions 



s(x) = - 
r x {h) = X 



a[l-2Q(x)] 



(8) 
(9) 



that take positive real arguments. Then, under the (technical) 
assumption that the extrema of the free energy falls within 
the replica symmetric ansatz (see, e.g., p2)-p4) for further 



discussion), the following results are derived in Section IV 



Proposition 1. Fix A, a, p x , p w and let the variances a 2 and 
erj be finite and greater than zero. Then, the threshold for the 
perfect reconstruction, mse — > 0, is given by the solution of 



A 



fa(A 2 +x)-2(l-fa)r A (x) 
[2(l-p x )Q(X/Vx)+Px] 2 ' 



(10) 



X = a(l- Pw)< A 



1-2Q 



-1/(2A) 



-2Q 



1 



ap u 



(11) 



that satisfies the condition 
a(l-Aj[l-2Q(^=)] = {1- Px )2Q(-j^ +p x . (12) 



A' 

Proposition 2. Let the system be outside of the perfect 
reconstruction phase given by Proposition [7J The MSE of the 



mse = p x a 2 x - ^a 2 x p x Q 



sparse signal estimate obtained with |2} and |3]l is then 

A 

. VX + °l m2 > 
-2m- 2 [(1 - Px )r x (x) + Px r x (x + o 2 x rn 2 )\ , (13) 

where the required parameters can be obtained by solving 

(14) 



(15) 




■ (16) 



The results imply that in the LSL the reconstruction has two 
phases; one where the MSE of the reconstruction vanishes 
and another where the estimate is noisy. The threshold is 
characterized by the parameters {A, a, p x , p w } implying that 
the condition for perfect reconstruction is independent of the 
signal and noise variances a 2 and ct^ , respectively. Outside of 
the perfect reconstruction phase, however, the signal to noise 
ratio of the system has an impact on the MSE. 

Mean square error predicted by Proposition [2] is shown in 
Fig.[Ta] Numerical experiments obtained with cvx |6| are also 
given. Below the thresholds p x = 0.0770 and p x = 0.1030 
for A = 1 and A = optimal, respectively, the MSE of the 



reconstruction vanishes. Figure lb shows the effect of A on 
the perfect recovery threshold given in Proposition [T] Here 
Pw — Spx, where S = 1/5,1/10,1/50. For given p x we 
find the critical a that admits perfect reconstruction, so that 
the MSE vanishes for the set of parameters that lie above 
the selected curve. The results demonstrate that the choice 
of parameter A has a significant impact on the performance. 
Note that optimization of A with simulations is extremely time 
consuming, while it is easy to do by using Proposition [T] 

IV. Replica Analysis 

In this section a sketch of derivation is given for Proposi- 
tions [T] and [2] Throughout the rest of the paper, the replica 
trick given in Section II-A is assumed to be valid. With this 
in mind, recall (|7]i and denote v a = A(x° — x a ). The term 
inside In in |6]) can then be written as 



u 



-/3A||x a l| lda ,a 



-/3\\v a -\-w\\ i 



(17) 



where x° has IID elements drawn according to p(x; p x ,a 2 ). 
We first concentrate on evaluating the latter term T U: p(X) = 
E A,™ IlLi e- fj ^ a+w ^, for a fixed set X = {x a }™i . 

Since A has IID elements with density <7a(0, 1/iV), condi- 
tioned on X the vectors {v a } tend to jointly Gaussian RVs 
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(a) MSE of reconstruction for a = 1/2, p w = 0.1 and crj = crj,. Lines for 
replica analysis based results and markers for simulations. 
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(b) Replica method based critical condition for perfect reconstruction. Solid 
lines for A = 1 and dashed lines for optimal A. 

Fig. 1. Reconstruction performance vs. signal density p x . 

by the central limit theorem as N —> oo. More precisely, if 
v e K uN is formed by stacking {f a }" =1 then v is a zero- 
mean Gaussian RV with covariance matrix R = Eavv t . 
We write this as d ~ g v (0,R) and note that the (a,6)th 
(a,b = 0,1, ... ,u) block of R is given by 

R a ,b = [QoO - (QaO + Qob) + Qafcj-^M, (18) 

where = N- 1 (x a ■ x h ). For later use, let Q G 

R («+i)x(«+i) be com p OS ed of the elements {Q ab }. Thus, for 
large N, 



g v (0,R) exp 



(3 



a=l 



W 1 



dv, 
(19) 



where we have omitted terms that vanish as N —> oo 

In the large system limit of N — > oo, Laplace's (the saddle 
point) method with respect to R yields the exact assessment 
of N' 1 In E{[Zp{y, A; A)]"} for Vn e N and V/3 > 0. We 
here assume that the dominant saddle point in the assessment 
is invariant under any permutation of the replica indexes 
a = 1,2, ... ,u, which is often termed the replica symmetric 
(RS) ansatz and is characterized as Q a o — Qob = m , Qaa = 
Q,a = l,...,u and Q a b — q for a ^ b G {!,..., u] in 



the current case. This allows us to express v a in (JT9J as 
ZaVQ ~ Q + ty/p ~ 2m + q e R M , which means that 



v- = z a vy - q + Wp — 2m + q e 
I U ^{X) is proportional to 



-P\z^^+t^p-2rn+q+w\-\z 2 j. 



,1/ 



(20) 



where E{ • • • } = J( • • • )p(w; p w , a^,)dwT)t and Di = 
dte~* / 2 /V2tt. Since we are interested in the zero temperature 
solution j3 —> oo, Laplace's method for the integral w.r.t. z 
implies 



Zu,/3(Q) OC 



Ee -uM(t,™;Q) 



vJV 



(21) 



We write next the exponential term in ( |20| ) in a slightly 
different form by denoting % = f3(Q — q) > and 
f(t, w; Q) — ty/p — 2m + Q + w. We also use the fact that 
p — 2m + q = p — 2m + Q — x/ft ~> P ~~ 2m + Q for any 
finite x- The Laplace's method requires then that 

(22) 



tp(t,w;Q) = min jjz^x + f(t,w, Q)\ + 



Examining the critical points of tp for a fixed set {t, w, x, Q} 
shows that the minimizing z gives 

n ' ' W) \\f(t,w;Q)\- X /2, \f(t,w;Q)\> X . 

The next task is to average l u ^(w; Q) over the set X. The 
expectation w.r.t. Q can be carried out under the RS ansatz 
by defining first the probability weight 



/u 
dx°p{x Q ) J[ (dx a e 



J] S(x a -x b -NQ ab ), (24) 



0<a<6<u 



and integrating then w.r.t. the measure fi(Q). Under the RS 
ansatz ( f2"4"| i has the same form as in p6| , fT7) so we skip the 
derivation here due to space constraints and arrive straight at 
the expression 



fi(Q) 



dQ exp 



/3N 



( QQ-xx 



V 2 1 

-y(xx-/3xQ) + ^logM„(Q; /3,A) 



(25) 



where Q is a short-hand for {x, Q, m}. We also have the 
moment generating function of the elements of {x a }^ =0 



M U (Q; 0, A) = (1 — Px )E z e-^ z ^> ® 

where E z ( ■ ■ ■ ) denotes /( • ■ • )Dz and <fi satisfies 

'-(\h\- A) 2 /(2Q), if\h\>\ 
0, if \h\ < A. 



Mh; Q) 



(26) 



(27) 



The final form of fi(Q) seems undoubtedly rather cryptic so 
we give here a few hints how it can be obtained (more details 
in fT6| , fT7)). The first task in obtaining ( p5| is to write the 
Dirac's delta functions using (inverse) Fourier transform and 
integrating over x° with the help of the Gaussian integral 



2 / 2+bx dx = 



1 



: exp 



(28) 



Then ( |28| ) is used right-to-left to decouple the replicated terms 
{x a } and the average over them is obtained using the saddle 
point method as — >• oo. These last two steps give arise 
to ( p7| and the integrals in ( p6) . Rest of the terms in ( |2"5j ) 
arise essentially from the (inverse) Fourier transform of the 
Dirac's delta functions where the hatted variables represent 
scaled transform domain variables. 

Combining pTj ) and ( |25j ) yields an expression for ( fTT) as 



(29) 



E{[^(»,A;A)Ha J dQl u AQ)KQ) 
+u ® Q 2 XX + y (xx - /?xQ) +^ log M U (Q; p, A) 



For the integration w.r.t. Q and Q we use again the saddle 
point method as N — >• oo. Note that we have then by the 
law of large numbers p —> cr 2 p x as well (see (pO)). Thus, the 
replica symmetric expression for (|6]i reads 



f (\\ 4. J W ^ 

/ re (A) = extr < — — - mm 



(30) 



- lim — — 
u->o+ cm 



log Ee-^*(*' u "«) + i log /3, A) 



where we used the fact that the order of extremization 
extr{ • • • } w.r.t. {x, m, Q, x, OT, Q} and the partial derivative 
w.r.t. u can be exchanged p2| . Solving the remaining deriva- 
tives finally gives the form 

/re (A) = extr | ~ mm - aE t iW ip(t, w; x, m, Q) 



-(1 - Px)E z <f>\(z\/i; 0) - PxE z <P\(z\/x + c^m 2 ; Q 



(31) 



in the limit u — > + . 

We have now managed to write the normalized free energy 
under RS ansatz as the solution of an extremization problem 
that has a couple of expectations inside. Let us first consider 
the derivatives w.r.t. the "hat-variables" {%, m, Q}. Since ip 
does not depend on them, the only non-trivial part is to solve 
the expectations and partial derivatives on the second line in 



Lemma 1. Let h be a real positive (function) independent of 
z € HL Then, for positive real parameters Q and A we have 



£ z 4>x(zVh- Q) = Q- 1 rx{h) 1 



(32) 
(33) 



where r\(h) is given in Q. 

Using the above results, the normalized free energy reads 

/rs(A) = cxtr | - mm - a£ tiW %l>(t, w; x, m, Q) 

+Q- 1 [(1 - p x )r x (x) + Px rx(x + <J 2 X ™ 2 )]}, (34) 
where \ is given in ( fl4] > and 

A ^ (35) 



Q = -2Q- 2 [(1 - p x )r x (x) + Px r x ( X + o 2 x m 2 )\ . (36) 

Lemma 2. Let f(t^/a, x\, . . . , Xk) be a real-valued function, 
where a > and {t, x±, . . . , Xf.} are independent random 
variables that do not depend on a. Then, 

^- j f(t\/a, xt,..., x k )Dt = ^ / f'{t\/a, xi,..., x fe )Dt, 

(37) 

where /"(■••) is the 2nd order partial derivative w.r.t. first 
argument. Also, denoting the indicator function !{•■■}, 



f l{\t\ > a}Bt = 2Q(a), 
J l{\t\ < a}Dt = 1 - 2Q(a), 
J t 2 t{\t\ < a}Dt = 1 - 2Q(a) - 



2a 



-a 2 /2 



(38) 
(39) 
(40) 



where the integrals are over the set of real numbers. 

Using ( |37| ) for the partial derivatives w.r.t. m and Q, and 
then ( |38| ) and ( |40| > for the remaining integrals shows that 
Q = m as given in ( fT5j ). Furthermore, mse = <J 2 p x — 2m + Q 
reduces to ( p"3j ) and gives the MSE of the reconstruction |16|, 
Similarly, from the derivative of x an d PS) - ( |40] > one 
gets ( fT6] >. Thus, we have obtained a full description of the free 
energy under the RS ansatz in terms of six parameters. More 
importantly, we obtained as a by product the MSE behavior 
of the convex optimization problem based on |2| and ([3J, 
finishing the proof of Proposition [2] 

To derive Proposition [T] we first require that mse — > 0, 
which implies p x (T 2 = m = Q and m = Q — > oo x ~ * 

0. For a non-trivial solution we should have x G 0(1) and 
< A < oo. Using the Taylor expansion of the Q-function 
and exponential function near zero, the proposition follows 
after some algebra (details omitted due to space constraints). 
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